
vision.transform

vision.transform provides classes and functions to handle data augmentation in computer vision.

Data augmentation

If you want to quickly get a set of random transforms that have proved to work well in a wide range of tasks, you should use the get_transforms function. The most important parameters to adjust are do_flip and flip_vert, depending on the type of images you have.

get_transforms

(do_flip:bool=True, flip_vert:bool=False, max_rotate:float=10.0, max_zoom:float=1.1, max_lighting:float=0.2, max_warp:float=0.2, p_affine:float=0.75, p_lighting:float=0.75, xtra_tfms:Optional[Collection[Transform]]=None) -> Collection[Transform]

Utility function to easily create a list of flip, rotate, zoom, warp and lighting transforms

  • do_flip: if True, a random flip is applied with probability 0.5
  • flip_vert: requires do_flip=True. If True, the image can be flipped vertically or rotated by 90 degrees, otherwise only a horizontal flip is applied
  • max_rotate: if not None, a random rotation between -max_rotate and max_rotate degrees is applied with probability p_affine
  • max_zoom: if greater than 1., a random zoom between 1. and max_zoom is applied with probability p_affine
  • max_lighting: if not None, a random lighting and contrast change controlled by max_lighting is applied with probability p_lighting
  • max_warp: if not None, a random symmetric warp of magnitude between -max_warp and max_warp is applied with probability p_affine
  • p_affine: the probability that each affine transform and symmetric warp is applied
  • p_lighting: the probability that each lighting transform is applied
  • xtra_tfms: a list of additional transforms you would like to be applied

This function returns a tuple of two lists of transforms, one for the training set and the other for the validation set (which is limited to a center crop by default).

tfms = get_transforms(); len(tfms)
2
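To make the parameter list above concrete, here is a small pure-Python sketch of the random draws it describes. It does not call fastai and does not touch images; `sample_augmentation_params` is a hypothetical helper, and the exact brightness and contrast ranges derived from max_lighting are assumptions, since the text only says they are "controlled by max_lighting".

```python
import random


def sample_augmentation_params(do_flip=True, flip_vert=False, max_rotate=10.0,
                               max_zoom=1.1, max_lighting=0.2, max_warp=0.2,
                               p_affine=0.75, p_lighting=0.75, rng=None):
    """Hypothetical helper: draw one set of random augmentation parameters following
    the rules listed for get_transforms. Returns a dict; no image is touched."""
    rng = rng or random.Random()
    params = {}
    if do_flip and rng.random() < 0.5:
        # with flip_vert, vertical flips and 90-degree rotations are allowed (8 dihedral cases)
        params['flip'] = rng.randint(0, 7) if flip_vert else 'horizontal'
    if max_rotate is not None and rng.random() < p_affine:
        params['rotate'] = rng.uniform(-max_rotate, max_rotate)
    if max_zoom is not None and max_zoom > 1.0 and rng.random() < p_affine:
        params['zoom'] = rng.uniform(1.0, max_zoom)
    if max_warp is not None and rng.random() < p_affine:
        params['warp'] = rng.uniform(-max_warp, max_warp)
    if max_lighting is not None:
        # assumption: brightness moves around the neutral 0.5 by up to 0.5*max_lighting
        if rng.random() < p_lighting:
            params['brightness'] = rng.uniform(0.5 * (1 - max_lighting), 0.5 * (1 + max_lighting))
        # assumption: contrast moves around the neutral 1. between 1-max_lighting and its inverse
        if rng.random() < p_lighting:
            params['contrast'] = rng.uniform(1 - max_lighting, 1 / (1 - max_lighting))
    return params


print(sample_augmentation_params(rng=random.Random(42)))
```

Each call gives a fresh set of values, which is what makes every epoch see a slightly different version of each image.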

Here is the example image we will use to show the data augmentation.

def get_ex(): return open_image('imgs/cat_example.jpg')
get_ex().show()

Let's see how the defaults of get_transforms change this little kitten now.

tfms = get_transforms()
fig, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfms[0], open_image('imgs/cat_example.jpg'), size=224)
    img.show(ax=ax)

Another useful function that gives basic transforms is:

zoom_crop

(scale:float, do_rand:bool=False, p:float=1.0)

Randomly zoom and/or crop

  • scale: Ratio by which to zoom the image
  • do_rand: If True, the transform is randomized, otherwise it's a zoom of scale and a center crop
  • p: Probability to apply the zoom

scale should be a single float if do_rand is False; otherwise it can be a range of floats, and the zoom will take a random value in between. Again, here is what this can give us.

tfms = zoom_crop(scale=(1.,1.2), do_rand=True)
fig, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfms[0], get_ex(), size=224)
    img.show(ax=ax)

rand_resize_crop

(size:int, max_scale:float=2.0, ratios:Tuple[float, float]=(0.75, 1.33))

Randomly resizes and crops the image to a ratio in ratios after a zoom of max_scale

  • size: Final size of the image
  • max_scale: Zooms the image to a random scale up to this
  • ratios: Range of ratios in which a new one will be randomly picked

This transform determines a new width and height of the image after the random scale and the squish to the new ratio are applied. Those are switched with probability 0.5. If the width and height are both less than the corresponding sizes of the image, we return the part of the image with that width and height centered at row_pct, col_pct; otherwise we try again with new random parameters.

tfm = rand_resize_crop(224)
_, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfm, get_ex(), size=224)
    img.show(ax=ax)
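The crop-selection loop described above can be sketched in plain Python. `rand_resize_crop_window` is a hypothetical helper that only computes the crop rectangle (top, left, height, width); the final resize to size is left out. The area/ratio formulas and the fallback after max_tries failed attempts are assumptions, not taken from fastai.

```python
import math
import random


def rand_resize_crop_window(img_h, img_w, max_scale=2.0, ratios=(0.75, 1.33),
                            max_tries=10, rng=None):
    """Hypothetical sketch: return (top, left, height, width) of a random crop chosen
    the way rand_resize_crop is described. The crop would then be resized to `size`."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        scale = rng.uniform(1.0, max_scale)   # zoom: the crop keeps 1/scale of the area
        ratio = rng.uniform(*ratios)          # squish: target width/height ratio
        area = img_h * img_w / scale
        w = int(round(math.sqrt(area * ratio)))
        h = int(round(math.sqrt(area / ratio)))
        if rng.random() < 0.5:                # switch width and height with probability 0.5
            w, h = h, w
        if w <= img_w and h <= img_h:         # accept only if the crop fits in the image
            row_pct, col_pct = rng.random(), rng.random()
            top = int((img_h - h) * row_pct)
            left = int((img_w - w) * col_pct)
            return top, left, h, w
        # otherwise try again with new random parameters
    # assumption (not in the description): give up and take a centered square crop
    side = min(img_h, img_w)
    return (img_h - side) // 2, (img_w - side) // 2, side, side


print(rand_resize_crop_window(300, 400, rng=random.Random(0)))
```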

Randomness

The functions that define each transform, like rotate or flip_lr, are deterministic. The fastai library then randomizes them in two different ways:

  • each transform can be defined with an argument named p representing the probability for it to be applied
  • each argument that is type-annotated with a random function (like uniform or rand_int) can be replaced by a tuple of arguments accepted by that function; on each call of the transform, the value passed to the function is then picked randomly using that random function.

If we look at the function rotate, for instance, we see it has an argument degrees that is type-annotated as uniform.

First level of randomness: We can define a transform using rotate with degrees fixed to a value, and pass an argument p. The rotation will then be executed with probability p, but always with the same value of degrees.

tfm = [rotate(degrees=30, p=0.5)]
fig, axs = plt.subplots(1,5,figsize=(12,4))
for ax in axs:
    img = apply_tfms(tfm, get_ex())
    title = 'Done' if tfm[0].do_run else 'Not done'
    img.show(ax=ax, title=title)

Second level of randomness: We can define a transform using rotate with degrees defined as a range, without an argument p. The rotation will then always be executed with a random value picked uniformly between the two floats we put in degrees.

tfm = [rotate(degrees=(-30,30))]
fig, axs = plt.subplots(1,5,figsize=(12,4))
for ax in axs:
    img = apply_tfms(tfm, get_ex())
    title = f"deg={tfm[0].resolved['degrees']:.1f}"
    img.show(ax=ax, title=title)

All combined: We can define a transform using rotate with degrees defined as a range, and an argument p. The rotation will then be executed with probability p, each time with a random value picked uniformly between the two floats we put in degrees.

tfm = [rotate(degrees=(-30,30), p=0.75)]
fig, axs = plt.subplots(1,5,figsize=(12,4))
for ax in axs:
    img = apply_tfms(tfm, get_ex())
    title = f"Done, deg={tfm[0].resolved['degrees']:.1f}" if tfm[0].do_run else f'Not done'
    img.show(ax=ax, title=title)
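The two levels of randomness in this section can be modelled with a few lines of plain Python. `SketchRandTransform`, `uniform` and `add_degrees` are hypothetical names; the attributes do_run and resolved only mirror the ones used in the examples above, and this is not fastai's RandTransform.

```python
import random


def uniform(rng, low, high):
    """Random function used to resolve a (low, high) tuple."""
    return rng.uniform(low, high)


class SketchRandTransform:
    """Hypothetical stand-in for a randomized transform (not fastai's class).

    func:         deterministic function called as func(x, **resolved)
    p:            probability that func is applied at all (first level)
    arg_samplers: maps an argument name to the random function used when a
                  tuple is passed for that argument (second level)
    """

    def __init__(self, func, p=1.0, arg_samplers=None, rng=None, **kwargs):
        self.func, self.p, self.kwargs = func, p, kwargs
        self.arg_samplers = arg_samplers or {}
        self.rng = rng or random.Random()
        self.resolved, self.do_run = {}, False

    def resolve(self):
        # first level: decide whether the transform runs on this call
        self.do_run = self.rng.random() < self.p
        # second level: replace each tuple by a fresh draw from its random function
        self.resolved = {
            name: (self.arg_samplers[name](self.rng, *value)
                   if name in self.arg_samplers and isinstance(value, tuple) else value)
            for name, value in self.kwargs.items()
        }

    def __call__(self, x):
        self.resolve()
        return self.func(x, **self.resolved) if self.do_run else x


def add_degrees(angle, degrees):
    """Toy deterministic 'transform' acting on a number instead of an image."""
    return angle + degrees


rot = SketchRandTransform(add_degrees, p=0.75, arg_samplers={'degrees': uniform}, degrees=(-30, 30))
print(rot(0.0), rot.do_run, rot.resolved)
```

Keeping the function deterministic and moving all randomness into resolve() is what makes it possible to inspect the drawn values after each call, as the titles in the examples above do.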

List of transforms

Here is the list of all the deterministic functions on which the transforms are built. As explained before, each of those can have a probability p of being executed, and any time an argument is type-annotated with a random function, it's possible to randomize it via that function.

brightness

(x, change:uniform) -> Image :: TfmLighting

change brightness of image x

  • x: Image to transform
  • change: Value to adjust the brightness, should be between 0. and 1., 0.5 is neutral.

This transform adjusts the brightness of the image depending on the value in change. A change of 0 will turn the image black and a change of 1 will turn it white. A change of 0.5 doesn't do anything.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for change, ax in zip(np.linspace(0.1,0.9,5), axs):
    brightness(get_ex(), change).show(ax=ax, title=f'change={change:.1f}')
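One way to get this behavior (0 gives black, 0.5 is neutral, 1 gives white) is to work in logit space. The sketch below assumes that model for a single pixel value in [0, 1]; `brightness_pixel` is a hypothetical helper, not fastai's implementation.

```python
import math


def _logit(p):
    return math.log(p / (1 - p))


def _sigmoid(z):
    return 1 / (1 + math.exp(-z))


def brightness_pixel(pixel, change, eps=1e-6):
    """Sketch, assuming brightness works in logit space: the pixel value in [0, 1]
    is shifted by logit(change), so change=0.5 (logit 0) leaves it unchanged."""
    pixel = min(max(pixel, eps), 1 - eps)     # keep the logits finite
    change = min(max(change, eps), 1 - eps)
    return _sigmoid(_logit(pixel) + _logit(change))


print([round(brightness_pixel(0.3, c), 3) for c in (0.1, 0.5, 0.9)])
```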

contrast

(x, scale:log_uniform) -> Image :: TfmLighting

scale contrast of image x

  • x: Image to transform
  • scale: Value to adjust the contrast, should be a positive number (1. is neutral)

This transform adjusts the contrast depending on the value in scale. A scale of 0 will turn the image grey and a very high scale will give it a very high contrast. A scale of 1. doesn't do anything.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for scale, ax in zip(np.exp(np.linspace(np.log(0.5),np.log(2),5)), axs):
    contrast(get_ex(), scale).show(ax=ax, title=f'scale={scale:.2f}')
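Under a logit-space assumption, contrast can be sketched as a multiplication: scaling the logit by 0 sends every pixel to 0.5 (grey), 1 leaves it unchanged and large values push pixels towards 0 or 1. `contrast_pixel` is a hypothetical helper working on one pixel value; it is not fastai's code.

```python
import math


def _stable_sigmoid(z):
    # avoid overflow in math.exp for large |z|
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def contrast_pixel(pixel, scale, eps=1e-6):
    """Sketch, assuming contrast works in logit space: the logit of the pixel value
    is multiplied by scale, so scale=1 is neutral and scale=0 gives 0.5 (grey)."""
    pixel = min(max(pixel, eps), 1 - eps)
    z = math.log(pixel / (1 - pixel)) * scale
    return _stable_sigmoid(z)


print([round(contrast_pixel(0.8, s), 3) for s in (0.0, 0.5, 1.0, 2.0)])
```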

crop

(x, size, row_pct:uniform=0.5, col_pct:uniform=0.5) -> Image :: TfmPixel

Crop x to size pixels. row_pct,col_pct select focal point of crop

  • x: Image to transform
  • size: Size of the crop, if it's an int, the crop will be square
  • row_pct: Between 0. and 1., position of the center on the y axis (0. is top, 1. is bottom, 0.5 is center)
  • col_pct: Between 0. and 1., position of the center on the x axis (0. is left, 1. is right, 0.5 is center)

This transform crops the image to the given size. The position of the crop is given by (col_pct, row_pct), both normalized between 0. and 1.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for center, ax in zip([[0.,0.], [0.,1.],[0.5,0.5],[1.,0.], [1.,1.]], axs):
    crop(get_ex(), 300, *center).show(ax=ax, title=f'center=({center[0]}, {center[1]})')
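The placement rule can be sketched on a plain 2D grid. `crop_window` and `crop_grid` are hypothetical helpers; rounding the offset and clamping the crop to the image size are assumptions, not necessarily fastai's exact arithmetic.

```python
def crop_window(img_h, img_w, size, row_pct=0.5, col_pct=0.5):
    """Hypothetical helper: (top, left, rows, cols) of the crop. row_pct/col_pct in
    [0, 1] slide the window from the top/left edge (0.) to the bottom/right edge (1.)."""
    rows, cols = (size, size) if isinstance(size, int) else size
    # assumption: a plain crop never goes past the image, so clamp the size
    rows, cols = min(rows, img_h), min(cols, img_w)
    top = round((img_h - rows) * row_pct)
    left = round((img_w - cols) * col_pct)
    return top, left, rows, cols


def crop_grid(pixels, size, row_pct=0.5, col_pct=0.5):
    """Apply crop_window to a 2D grid given as a list of rows."""
    top, left, rows, cols = crop_window(len(pixels), len(pixels[0]), size, row_pct, col_pct)
    return [row[left:left + cols] for row in pixels[top:top + rows]]


img = [[10 * r + c for c in range(10)] for r in range(8)]   # 8 rows x 10 columns
print(crop_grid(img, 4, 0.5, 0.5))
```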

crop_pad

(x, size, padding_mode='reflection', row_pct:uniform=0.5, col_pct:uniform=0.5) -> Image :: TfmCrop

Crop and pad tfm - row_pct,col_pct sets focal point

  • x: Image to transform
  • size: Size of the crop, if it's an int, the crop will be square
  • padding_mode: How to pad the output image ('zeros', 'border' or 'reflection')
  • row_pct: Between 0. and 1., position of the center on the y axis (0. is top, 1. is bottom, 0.5 is center)
  • col_pct: Between 0. and 1., position of the center on the x axis (0. is left, 1. is right, 0.5 is center)

This works like crop, but if the target size is bigger than the size of the image on either dimension, padding is applied according to padding_mode (see pad for an example of all the options) and the position of the center is ignored on that dimension.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for size, ax in zip(np.linspace(200,600,5), axs):
    crop_pad(get_ex(), int(size), 'zeros', 0.,0.).show(ax=ax, title=f'size = {int(size)}')

dihedral

(x, k:partial) -> Image :: TfmPixel

Randomly flip x image based on k

  • x: Image to transform
  • k: Integer between 0 and 7 that represents one of the 8 dihedral transformations possible

This transform applies one of the eight possible dihedral transformations of the image, obtained by combining a flip (horizontal or vertical) and a rotation by a multiple of 90 degrees.

fig, axs = plt.subplots(2,4,figsize=(12,8))
for k, ax in enumerate(axs.flatten()):
    dihedral(get_ex(), k).show(ax=ax, title=f'k={k}')
plt.tight_layout()
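A minimal sketch of the eight dihedral transformations on a 2D grid, assuming k is decomposed bit by bit into a vertical flip, a horizontal flip and a transpose. `dihedral_grid` is a hypothetical helper, and fastai's numbering of k may differ from this one.

```python
def dihedral_grid(grid, k):
    """Sketch: apply one of the 8 dihedral transforms to a 2D grid (list of rows).
    Assumed decomposition of k: bit 0 flips vertically, bit 1 flips horizontally,
    bit 2 transposes. Together they cover every flip / 90-degree-rotation combination."""
    if not 0 <= k <= 7:
        raise ValueError('k must be between 0 and 7')
    if k & 1:
        grid = grid[::-1]                          # vertical flip: reverse row order
    if k & 2:
        grid = [row[::-1] for row in grid]         # horizontal flip: reverse each row
    if k & 4:
        grid = [list(col) for col in zip(*grid)]   # transpose rows and columns
    return [list(row) for row in grid]


g = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for k in range(8):
    print(k, dihedral_grid(g, k))
```

The transpose combined with a flip is what produces the 90 and 270 degree rotations, which is why three bits are enough for all eight cases.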

flip_lr

(x) -> Image :: TfmPixel

  • x: Image to transform

This transform horizontally flips the image.

fig, axs = plt.subplots(1,2,figsize=(6,4))
get_ex().show(ax=axs[0], title=f'no flip')
flip_lr(get_ex()).show(ax=axs[1], title=f'flip')

jitter

(c, img_size, magnitude:uniform) -> Image :: TfmCoord

  • c: Coords to transform (automatically passed by the fastai pipeline)
  • img_size: Size of the image (automatically passed by the fastai pipeline)
  • magnitude: Strength of the jitter

This transform changes the pixels of the image by randomly replacing them with pixels from the neighborhood (how far we go is controlled by the value of magnitude).

fig, axs = plt.subplots(1,5,figsize=(12,4))
for magnitude, ax in zip(np.linspace(-0.05,0.05,5), axs):
    get_ex().jitter(magnitude).show(ax=ax, title=f'magnitude={magnitude:.2f}')
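In coordinate terms, jitter can be sketched as adding a small random offset to every sampling coordinate before the image is resampled. The helper `jitter_coords` is hypothetical; it assumes coordinates normalized to [-1, 1] and offsets drawn uniformly between -magnitude and magnitude.

```python
import random


def jitter_coords(coords, magnitude, rng=None):
    """Hypothetical sketch: shift every sampling coordinate (x, y), given in [-1, 1],
    by an independent offset drawn uniformly between -magnitude and magnitude.
    Resampling the image at the shifted points reads each output pixel from a
    nearby input pixel."""
    rng = rng or random.Random()

    def offset():
        return (rng.random() - 0.5) * 2 * magnitude

    return [(x + offset(), y + offset()) for x, y in coords]


# a 5x5 grid of normalized coordinates, jittered by up to 0.05
grid = [(-1 + 0.5 * c, -1 + 0.5 * r) for r in range(5) for c in range(5)]
jittered = jitter_coords(grid, 0.05, rng=random.Random(0))
print(jittered[:3])
```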

pad

(x, padding, mode='reflection') -> Image :: TfmPixel

Pad x with padding pixels. mode fills in space ('zeros','reflection','border')

  • x: Image to transform
  • padding: Padding to add on each side of the picture
  • mode: Padding mode ('zeros', 'border' or 'reflection')

Pads the image by adding padding pixels on each side of the picture according to mode:

  • mode = zeros pads with zeros,
  • mode = border repeats the pixels at the border,
  • mode = reflection pads by taking the pixels symmetric to the border.

fig, axs = plt.subplots(1,3,figsize=(12,4))
for mode, ax in zip(['zeros', 'border', 'reflection'], axs):
    pad(get_ex(), 50, mode).show(ax=ax, title=f'mode={mode}')
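The three padding modes can be sketched on a 1D list and then applied to the rows and columns of a 2D grid. `pad_1d` and `pad_2d` are hypothetical helpers; the reflection variant here excludes the border pixel itself (as PyTorch's 'reflect' padding does), which is an assumption about what "symmetric to the border" means.

```python
def pad_1d(values, padding, mode='reflection'):
    """Pad a list on both sides with `padding` items using one of three modes."""
    values = list(values)
    n = len(values)
    if mode == 'zeros':
        return [0] * padding + values + [0] * padding
    if mode == 'border':
        # repeat the first and last values
        return [values[0]] * padding + values + [values[-1]] * padding
    if mode == 'reflection':
        # mirror around the edge, without repeating the edge value itself
        if padding >= n:
            raise ValueError('reflection padding must be smaller than the input length')
        left = [values[i] for i in range(padding, 0, -1)]
        right = [values[n - 2 - i] for i in range(padding)]
        return left + values + right
    raise ValueError(f'unknown padding mode {mode!r}')


def pad_2d(grid, padding, mode='reflection'):
    """Pad a 2D grid: first every row (left/right), then every column (top/bottom)."""
    rows = [pad_1d(row, padding, mode) for row in grid]
    cols = [pad_1d(col, padding, mode) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


for mode in ('zeros', 'border', 'reflection'):
    print(mode, pad_1d([1, 2, 3], 2, mode))
```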

perspective_warp

(c, img_size, magnitude:partial=0) -> Image :: TfmCoord

Apply warp to c with size img_size and magnitude amount

  • c: Coords to transform (automatically passed by the fastai pipeline)
  • img_size: Size of the image (automatically passed by the fastai pipeline)
  • magnitude: Vector of eight coordinates explaining how to transform each corner

Perspective warping deforms the image as if it were seen from a different plane in 3D space. The new plane is determined by telling where we want each of the four corners of the image to go (from -1 to 1, -1 being left/top, 1 being right/bottom).

fig, axs = plt.subplots(2,4,figsize=(12,8))
for i, ax in enumerate(axs.flatten()):
    magnitudes = torch.tensor(np.zeros(8))
    magnitudes[i] = 0.5
    perspective_warp(get_ex(), magnitudes).show(ax=ax, title=f'coord {i}')
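Moving the four corners defines a projective transformation (a homography), which can be recovered by solving an 8x8 linear system. This is a plain-Python sketch: the corner order in `CORNERS` and the reading of magnitude as (dx, dy) offsets per corner are assumptions, and all helper names are hypothetical.

```python
# Corners of the image in normalized coordinates (assumed order: TL, TR, BL, BR).
CORNERS = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]


def perspective_targets(magnitude):
    """Assumption: magnitude holds 8 values, a (dx, dy) offset per corner in CORNERS order."""
    return [(x + magnitude[2 * i], y + magnitude[2 * i + 1]) for i, (x, y) in enumerate(CORNERS)]


def solve_linear(a, b):
    """Solve a @ h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError('degenerate corners: no unique perspective transform')
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        h[r] = (m[r][n] - sum(m[r][c] * h[c] for c in range(r + 1, n))) / m[r][r]
    return h


def homography_from_corners(src, dst):
    """3x3 matrix H (with H[2][2] = 1) sending each src point to the matching dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(hm, x, y):
    """Map (x, y) through H, dividing by the projective coordinate."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


H = homography_from_corners(CORNERS, perspective_targets([0.5, 0, 0, 0, 0, 0, 0, 0]))
print(apply_homography(H, 0.0, 0.0))   # where the image center ends up
```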

rotate

(degrees:uniform) -> Image :: TfmAffine

Affine func that rotates the image

  • degrees: Angle to use to rotate the image

Rotates the image by the given number of degrees.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for deg, ax in zip(np.linspace(-60,60,5), axs):
    get_ex().rotate(degrees=deg).show(ax=ax, title=f'degrees={deg}')
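In normalized coordinates (the image spanning [-1, 1] on both axes), a rotation is a 3x3 affine matrix. The sketch below builds it; whether a positive angle appears clockwise on screen depends on the y-axis direction and on whether the matrix maps output to input coordinates, which we do not assume here. `rotation_matrix` and `apply_affine` are hypothetical helpers.

```python
import math


def rotation_matrix(degrees):
    """3x3 affine matrix of a rotation by `degrees` around the center (0, 0) of the
    normalized coordinate system where the image spans [-1, 1] on both axes."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


def apply_affine(m, x, y):
    """Apply a 3x3 affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


print(apply_affine(rotation_matrix(90), 1.0, 0.0))
```

Using a 3x3 matrix rather than a 2x2 one leaves room for a translation column, so rotations, zooms and squishes can all be composed by matrix multiplication.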

skew

(c, img_size, direction:rand_int, magnitude:uniform=0) -> Image :: TfmCoord

Skew c field and resize to img_size with random direction and magnitude

  • c: Coords to transform (automatically passed by the fastai pipeline)
  • img_size: Size of the image (automatically passed by the fastai pipeline)
  • direction: One of the eight possible skews
  • magnitude: Strength of the skew

Skews the image in a given direction with a given magnitude.

fig, axs = plt.subplots(2,4,figsize=(12,8))
for i, ax in enumerate(axs.flatten()):
    get_ex().skew(i, 0.2).show(ax=ax, title=f'direction={i}')

squish

(scale:uniform=1.0, row_pct:uniform=0.5, col_pct:uniform=0.5) -> Image :: TfmAffine

Squish image by scale. row_pct,col_pct select focal point of zoom

  • scale: Ratio by which to squish the image
  • row_pct: Between 0. and 1., position of the center on the y axis (0. is top, 1. is bottom, 0.5 is center)
  • col_pct: Between 0. and 1., position of the center on the x axis (0. is left, 1. is right, 0.5 is center)

Squishes the image with the value in scale, with the center being given by row_pct,col_pct.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for scale, ax in zip(np.linspace(0.66,1.33,5), axs):
    get_ex().squish(scale=scale).show(ax=ax, title=f'scale={scale:.2f}')
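Squishing can be sketched as an affine matrix that maps output coordinates in [-1, 1] to the input coordinates to sample, shrinking one axis and sliding the kept band according to col_pct or row_pct. `squish_matrix` and `sampled_range` are hypothetical helpers and the exact formulas are an assumption; the property worth checking is that the sampled band never leaves the image.

```python
def squish_matrix(scale=1.0, row_pct=0.5, col_pct=0.5):
    """Hypothetical sketch: affine matrix mapping output coordinates in [-1, 1] to
    input coordinates. scale <= 1 samples a fraction `scale` of the width; scale > 1
    samples a fraction 1/scale of the height. The sampled band is slid by
    col_pct/row_pct and always stays inside [-1, 1]."""
    if scale <= 1:
        col_c = (1 - scale) * (2 * col_pct - 1)
        return [[scale, 0.0, col_c], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    row_c = (1 - 1 / scale) * (2 * row_pct - 1)
    return [[1.0, 0.0, 0.0], [0.0, 1 / scale, row_c], [0.0, 0.0, 1.0]]


def sampled_range(m, axis):
    """Interval of input coordinates sampled along `axis` (0 = x, 1 = y)
    when the output spans [-1, 1]."""
    a, c = m[axis][axis], m[axis][2]
    return c - a, c + a


print(sampled_range(squish_matrix(0.5, col_pct=0.0), 0))
```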

symmetric_warp

(c, img_size, magnitude:partial=0) -> Image :: TfmCoord

Apply warp to c with size img_size and magnitude amount

  • c: Coords to transform (automatically passed by the fastai pipeline)
  • img_size: Size of the image (automatically passed by the fastai pipeline)
  • magnitude: Vector of 4 coordinates for the strength in each corner

Applies the four tilts at the same time, each with a strength given in the vector magnitude. See tilt just below for the effect of each individual tilt.

tfm = symmetric_warp(magnitude=(-0.2,0.2))
_, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfm, get_ex(), padding_mode='zeros')
    img.show(ax=ax)

tilt

(c, img_size, direction:rand_int, magnitude:uniform=0) -> Image :: TfmCoord

Tilt c field and resize to img_size with random direction and magnitude

  • c: Coords to transform (automatically passed by the fastai pipeline)
  • img_size: Size of the image (automatically passed by the fastai pipeline)
  • direction: Integer between 0 and 3
  • magnitude: Strength of the tilt

Tilts the image in the direction given (0: left, 1: right, 2: top, 3: bottom) with a certain magnitude. A positive number is a tilt forward (toward the person looking at the picture), a negative number a tilt backward.

fig, axs = plt.subplots(2,4,figsize=(12,8))
for i in range(4):
    get_ex().tilt(i, 0.4).show(ax=axs[0,i], title=f'direction={i}, fwd')
    get_ex().tilt(i, -0.4).show(ax=axs[1,i], title=f'direction={i}, bwd')

zoom

(scale:uniform=1.0, row_pct:uniform=0.5, col_pct:uniform=0.5) -> Image :: TfmAffine

Zoom image by scale. row_pct,col_pct select focal point of zoom

  • scale: Ratio by which to zoom the image
  • row_pct: Between 0. and 1., position of the center on the y axis (0. is top, 1. is bottom, 0.5 is center)
  • col_pct: Between 0. and 1., position of the center on the x axis (0. is left, 1. is right, 0.5 is center)

Zooms the image with the value in scale, the center being given by row_pct,col_pct.

fig, axs = plt.subplots(1,5,figsize=(12,4))
for scale, ax in zip(np.linspace(1., 1.5,5), axs):
    get_ex().zoom(scale=scale).show(ax=ax, title=f'scale={scale:.2f}')
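Zooming can be sketched as an affine matrix mapping output coordinates in [-1, 1] to input coordinates, with both axes scaled by 1/scale. In this sketch (`zoom_matrix` is a hypothetical helper and the formulas are assumed), row_pct and col_pct move the center of the sampled window only as far as keeps it inside [-1, 1].

```python
def zoom_matrix(scale=1.0, row_pct=0.5, col_pct=0.5):
    """Hypothetical sketch: zooming by `scale` samples a window 1/scale as wide and
    as tall as the image; its center is slid by col_pct/row_pct so that the window
    stays inside the image (edges at -1 and 1)."""
    s = 1 - 1 / scale                  # how far the center may move from 0
    col_c = s * (2 * col_pct - 1)
    row_c = s * (2 * row_pct - 1)
    return [[1 / scale, 0.0, col_c], [0.0, 1 / scale, row_c], [0.0, 0.0, 1.0]]


# zoom x2 on the top-right quarter of the image
print(zoom_matrix(2.0, row_pct=0.0, col_pct=1.0))
```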

zoom_squish

(c, img_size, scale:uniform=1.0, squish:uniform=1.0, invert:rand_bool=False, row_pct:uniform=0.5, col_pct:uniform=0.5) -> Image :: TfmCoord

  • c: Coords to transform (automatically passed by the fastai pipeline)
  • img_size: Size of the image (automatically passed by the fastai pipeline)
  • scale: Zooms the image by scale
  • squish: Factor by which to squish the image
  • invert: Invert the image ratio
  • row_pct: Between 0. and 1., position of the center on the y axis (0. is top, 1. is bottom, 0.5 is center)
  • col_pct: Between 0. and 1., position of the center on the x axis (0. is left, 1. is right, 0.5 is center)

This transform determines a new width and height of the image after the scale and squish are applied. Those are switched if invert is True. If the width and height are both less than the corresponding sizes of the image, we return the part of the image with that width and height centered at row_pct, col_pct; otherwise we take a center crop. If an array of values is passed for scale, squish and invert, the transform tries them in order and returns the first image where this condition is met.

Used in combination with crop, this imitates RandomResizedCrop from torchvision:

tfm = [zoom_squish(scale=(1.,2.,8), squish=(0.75,1.33,8), invert=(0.5,8), row_pct=(0.,1.), col_pct=(0.,1.)), crop(size=224)]
_, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfm, get_ex(), size=224, padding_mode='zeros')
    img.show(ax=ax)
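The candidate-selection rule described above can be sketched without images. `zoom_squish_window` is a hypothetical helper returning (top, left, height, width) in pixels; the assumed formulas keep 1/scale of the image area and multiply its width/height ratio by squish, and the size of the center-crop fallback is also an assumption.

```python
import math


def zoom_squish_window(img_h, img_w, candidates, row_pct=0.5, col_pct=0.5):
    """Hypothetical sketch: try (scale, squish, invert) candidates in order and return
    the first window (top, left, height, width) that fits inside the image."""
    for scale, squish, invert in candidates:
        w = img_w * math.sqrt(squish / scale)   # area is divided by scale ...
        h = img_h / math.sqrt(squish * scale)   # ... and w/h is multiplied by squish
        if invert:
            w, h = h, w                         # invert the ratio
        if w <= img_w and h <= img_h:
            top = (img_h - h) * row_pct
            left = (img_w - w) * col_pct
            return top, left, h, w
    # assumption: fall back to a centered square crop on the shortest side
    side = min(img_h, img_w)
    return (img_h - side) / 2, (img_w - side) / 2, side, side


print(zoom_squish_window(100, 100, [(1.0, 2.0, False), (2.0, 1.0, False)]))
```

Passing several candidates at once, as in the tfm above, lets one call usually succeed without a retry loop, at the cost of computing a few windows that are thrown away.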

Last random functions

rand_crop(*args, **kwargs)

Random crop and pad

Returns a randomized version of crop_pad.

tfm = rand_crop()
_, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfm, get_ex(), size=224)
    img.show(ax=ax)

rand_zoom(*args, **kwargs)

Random zoom tfm

Returns a randomized version of zoom.

tfm = rand_zoom(scale=(1.,1.5))
_, axs = plt.subplots(2,4,figsize=(12,6))
for ax in axs.flatten():
    img = apply_tfms(tfm, get_ex())
    img.show(ax=ax)